DBRS: A Density-Based Spatial Clustering Method with Random Sampling
Authors
Abstract
When analyzing spatial databases or other datasets with spatial attributes, one frequently wants to cluster the data according to spatial attributes. In this paper, we describe a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters of varying densities, clusters which depend on non-spatial attributes, and approximate clusters in very large databases. DBRS achieves these results by repeatedly picking an unclassified point at random and examining its neighborhood. If the neighborhood is sparsely populated or the purity of the points in the neighborhood is too low, the point is classified as noise. Otherwise, if any point in the neighborhood is part of a known cluster, this neighborhood is joined to that cluster. If neither of these two possibilities applies, a new cluster is begun with this neighborhood. DBRS scales well on dense clusters. A heuristic is proposed for approximate clustering in very large databases. With this heuristic, the run time can be significantly reduced by assuming that a probabilistically controlled number of points are noise. A theoretical comparison of DBRS and DBSCAN, a well-known density-based algorithm, is given. Finally, DBRS is empirically compared with DBSCAN, CLARANS, and k-means on synthetic and real data sets.
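The classification loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `dbrs_sketch`, `eps`, `min_pts`, and `min_purity` are hypothetical, the neighborhood search is brute force, and several details are assumptions the abstract does not spell out. Purity is taken to be the fraction of neighbors sharing the picked point's non-spatial label, only those matching neighbors join a cluster, clusters touched by the same neighborhood are merged, and points earlier marked as noise may later be absorbed into a cluster. The heuristic for very large databases is not covered.

```python
import random
from math import dist

NOISE = -1           # label for points classified as noise
UNCLASSIFIED = None  # label for points not yet examined


def dbrs_sketch(points, labels, eps, min_pts, min_purity, seed=0):
    """Return one cluster id (or NOISE) per point.

    points: list of coordinate tuples; labels: one non-spatial value per point.
    All parameter names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(points)
    cluster_of = [UNCLASSIFIED] * n
    next_id = 0

    # A shuffled visiting order stands in for repeated random picks
    # among the points that are still unclassified.
    order = list(range(n))
    rng.shuffle(order)

    for i in order:
        if cluster_of[i] is not UNCLASSIFIED:
            continue

        # Brute-force eps-neighborhood; it always contains i itself.
        nbrs = [j for j in range(n) if dist(points[i], points[j]) <= eps]
        # Assumed purity: share of neighbors with the same label as i.
        matching = [j for j in nbrs if labels[j] == labels[i]]
        purity = len(matching) / len(nbrs)

        # Sparse or impure neighborhood: the picked point is noise.
        if len(matching) < min_pts or purity < min_purity:
            cluster_of[i] = NOISE
            continue

        # Known clusters that this neighborhood already touches.
        touched = {cluster_of[j] for j in matching
                   if cluster_of[j] not in (UNCLASSIFIED, NOISE)}
        if touched:
            target = min(touched)      # join an existing cluster
        else:
            target = next_id           # begin a new cluster
            next_id += 1

        for j in matching:             # may reclaim earlier noise points
            cluster_of[j] = target
        if len(touched) > 1:           # assumed merge of touched clusters
            for k in range(n):
                if cluster_of[k] in touched:
                    cluster_of[k] = target

    return cluster_of
```

Calling `dbrs_sketch(points, labels, eps=1.5, min_pts=3, min_purity=0.5)` on two well-separated groups of points plus one isolated point should give each group its own cluster id and mark the isolated point as `NOISE`. The cluster ids themselves depend on the random visiting order.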
Similar resources
Density-Based Spatial Clustering in the Presence of Obstacles and Facilitators
In this paper, we propose a new spatial clustering method, called DBRS+, which aims to cluster spatial data in the presence of both obstacles and facilitators. It can handle datasets with intersected obstacles and facilitators. Without preprocessing, DBRS+ processes constraints during clustering. It can find clusters with arbitrary shapes and varying densities. DBRS+ has been empirically evalua...
A Comparative Study of Two Density-Based Spatial Clustering Algorithms for Very Large Datasets
Spatial clustering is an active research area in spatial data mining with various methods reported. In this paper, we compare two density-based methods, DBSCAN and DBRS. First, we briefly describe the methods and then compare them from a theoretical view. Finally, we give an empirical comparison of the algorithms.
The Effect of Tree Spatial Distribution Pattern on Density Estimation Using the Nearest Individual Sampling Method: Case Studies in Zagros Wild Pistachio Woodlands and Simulated Stands
Distance methods and their estimators of density may have biased measurements unless the studied stand of trees has a random spatial pattern. This study aimed at assessing the effect of spatial arrangement of wild pistachio trees on the results of density estimation by using the nearest individual sampling method in Zagros woodlands, Iran, and applying a correction factor based on the spatial p...
Sampling from social networks’s graph based on topological properties and bee colony algorithm
In recent years, the sampling problem in massive social network graphs has attracted much attention, because a small, good sample can be analyzed quickly in place of a huge network. Many algorithms have been proposed for sampling social network graphs. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...